Abstract: Prompt radiation emitted during accelerator operation poses a significant health risk, so hazardous areas must be thoroughly searched and secured before the accelerator is started. Currently, this is done by manual sweeps, whose limitations have become increasingly evident as accelerators grow in scale. Advances in machine vision make it possible to identify stranded personnel in controlled areas automatically from camera images, offering an efficient approach to search and security. Because the safety of stranded individuals is at stake, the search-and-secure process must be highly reliable. To ensure comprehensive coverage, camera groups with a field of view of about 180° were positioned on both sides of the accelerator tunnel to eliminate blind spots within the monitoring range. The YOLOv8 network model was modified to detect both small targets, such as hands and feet, and the very large targets formed by individuals close to the cameras. Furthermore, the system incorporates a pedestrian recognition model that detects human body parts, and an information fusion strategy integrates the detected heads, hands, and feet with the identified pedestrians as a cohesive unit. This strategy enhanced the ability of the model to identify pedestrians obstructed by equipment and notably improved the recall rate: recall rates of 0.915 and 0.82 were obtained for Datasets 1 and 2, respectively. Although precision decreased slightly, this tradeoff is consistent with the intended purpose of the search-and-secure software. Experimental tests conducted within an accelerator tunnel demonstrated that the approach achieves reliable recognition.
1 Introduction

Prompt radiation is generated during the operation of particle accelerators. It is highly energetic and intense, so individuals within the controlled area would receive significant doses from the generated neutrons and gamma rays [1]. Thus, all personnel must be evacuated from the controlled area before accelerator operation begins.

Conventionally, skilled personnel are deployed to clear a controlled area. They enter the area, meticulously inspect it, and evacuate any other personnel according to a pre-established sequence and set of locations [2]. This approach presents several issues. (1) With the continuous advancement of science and technology, accelerator installations are growing in scale. Accelerators several kilometers or even tens of kilometers long already exist, and accelerators approaching 100 kilometers in length are being planned [3-5]. These large-scale accelerators have correspondingly larger controlled areas, so the traditional approach becomes increasingly time-consuming and inefficient. (2) Large accelerators contain many components and intricate structures, creating numerous blind spots within hazardous areas. These obstructions raise significant safety concerns because they may conceal individuals whom the search personnel have not found.

In light of these circumstances, we present a machine-vision-based intelligent search-and-secure technology. It uses camera groups deployed in the hazardous area and a server running an identification program designed to identify stranded individuals within the tunnel quickly and intelligently.


Owing to the multitude of equipment within the hazardous area of the accelerator, some of it large, the line of sight of search-and-secure personnel may be obscured. Occlusion is also a persistent challenge in pedestrian target detection [6]. To address it, Pang et al. introduced a mask-guided attention network that enhances the detection of occluded pedestrians by emphasizing the visible parts of the human body and suppressing occluded regions [7]. Zhang et al. proposed OR-CNN (Occlusion-aware R-CNN), which modifies both the loss function and the region of interest (RoI) pooling operation of a two-stage detector [8]. To address pedestrian pose variability and mutual occlusion, Khan et al. argued that human heads, which are less susceptible to occlusion, can serve as robust cues for detection across diverse scales in complex scenes. Their head detection system produced promising results, encouraging the use of local (part-based) detection to identify occluded pedestrians [9]. Moreover, Chen et al. presented a pedestrian detection method that integrates head and full-body information through multi-feature fusion [10]. Drawing on these methods, we treat the head, hands, and feet as subsets of a pedestrian's body and integrate them into the overall pedestrian detection. This integration addresses the problem of stranded individuals in a tunnel being hidden by equipment.

This study aimed to develop an intelligent monitoring system for sweeping an accelerator tunnel, covering both hardware and software. On the hardware side, we designed and implemented a camera group with horizontal and vertical field angles close to 180°. Positioned on both sides of the tunnel, these camera groups reduce the chance that a pedestrian is completely hidden from view. On the software side, we designed the Parts of the Human Body (PHB) model for pedestrian recognition, which identifies partially covered pedestrians from their heads, hands, and feet, together with intelligent search-and-secure software built around it. By combining the camera groups with the PHB model, the system achieves one-click intelligent clearing of the accelerator tunnel.

The contributions of this study are as follows: (1) A novel machine-vision-based search-and-secure system is introduced to ensure that all individuals have been evacuated from the tunnel before the accelerator is started; the core focus is tailoring the system to the specifics of an accelerator tunnel environment. (2) To address occlusion by accelerator equipment, we designed a camera group of six cameras that provides comprehensive visual coverage, an approach that, to our knowledge, has not been explored previously. (3) The YOLOv8 model is enhanced with body-part recognition to detect stranded personnel, which significantly increases the recall rate.
2 System Architecture 

The hazardous area of a large accelerator is typically no more than 10 m wide and 6 m high, and ranges from several hundred meters to tens of kilometers in length. The primary accelerator equipment runs along the length of the hazardous area, as illustrated in Figure 1. In the proposed intelligent search-and-secure system, detection units are placed at 15 m intervals; each unit consists of camera groups on both sides of the hazardous area, connected to both the regular video surveillance server and the intelligent video surveillance server. The regular video surveillance server provides standard functions such as real-time monitoring and video playback, whereas the intelligent video surveillance server runs the PHB recognition program designed for intelligent sweeps.
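To give a sense of scale, the sketch below estimates the number of detection units and cameras for a tunnel of a given length. It assumes one detection unit per 15 m of tunnel (rounded up), two camera groups per unit (one on each side), and six cameras per group as described in Section 3; the function name is hypothetical.

```python
import math


def camera_inventory(tunnel_length_m: float, spacing_m: float = 15.0,
                     groups_per_unit: int = 2, cameras_per_group: int = 6) -> dict:
    """Estimate hardware counts for the search-and-secure system.

    Assumes one detection unit per `spacing_m` of tunnel (rounded up), with one
    camera group on each side of the hazardous area.
    """
    units = math.ceil(tunnel_length_m / spacing_m)
    groups = units * groups_per_unit
    return {"units": units, "groups": groups, "cameras": groups * cameras_per_group}


print(camera_inventory(1000.0))  # → {'units': 67, 'groups': 134, 'cameras': 804}
```

Because the camera count grows linearly with tunnel length, kilometer-scale facilities need hundreds of cameras, which is one reason automated image analysis rather than manual review is attractive.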
Fig. 1 Architecture diagram of intelligent search-and-secure system
3 Design of 180° Camera Group 
The main equipment of the accelerator comprises magnets, vacuum beam tubes, high-frequency cavities, beam detection equipment, and various equipment supports [11-13], as illustrated in Figure 2. Near smaller equipment, such as vacuum pipes, personnel can perform maintenance while most of their body remains uncovered, so they can be identified by cameras positioned on either side of the equipment. Conversely, larger equipment, such as magnets and high-frequency cavities, may require personnel to work on its top and sides. Personnel working beside a magnet are heavily occluded, so a camera on the far side may fail to recognize them. Therefore, camera groups must be arranged on both sides of the magnet. The limited space beneath the magnet prevents a person's whole body from entering; however, body parts such as the head, hands, and feet remain identifiable. In the 3–5 m height range of the controlled area, cable bridges and ventilation pipes are typically installed along the walls, with cranes positioned at the top. Personnel may approach these areas for maintenance, so the cameras must be able to see them. Along the length of the controlled area, the monitoring distance of each camera must be maximized, while the camera must still be able to monitor the body parts of individuals in close proximity.
Fig. 2 Primary components of the particle accelerator

Based on this analysis, to prevent personnel from being overlooked because of camera blind spots, the vertical and horizontal field angles of the cameras in the intelligent search-and-secure system must be close to 180°. A single camera cannot satisfy this requirement, so multiple cameras must be combined into a camera group [14].

The camera group was mounted on the wall below the cable bridge and the ventilation duct. Because larger equipment in the accelerator tunnel, such as magnets, can reach a height of approximately 2 m, installing the camera group at approximately 2 m is recommended to reduce obstructions. A single camera with a 1/2.7" CMOS sensor has an imaging area of 5.27 mm x 3.96 mm (w x h). A short-focal-length lens was used to obtain the widest possible field of view: with a 2.8 mm lens, Equations (1) and (2) give a horizontal field angle of 86.5° and a vertical field angle of 70.5°.
$$\alpha = 2\arctan\left(\frac{w}{2f}\right) \tag{1}$$
$$\beta = 2\arctan\left(\frac{h}{2f}\right) \tag{2}$$
where α and β are the horizontal and vertical field angles, w and h are the width and height of the sensor's imaging area, and f is the focal length of the lens [15].
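Equations (1) and (2) can be checked numerically. The sketch below reproduces the field angles quoted above for the 1/2.7" sensor and the 2.8 mm lens; the function name is illustrative. A shorter focal length increases both angles, which is why a short-focal-length lens was chosen.

```python
import math


def field_angles(sensor_w_mm: float, sensor_h_mm: float, focal_mm: float) -> tuple:
    """Horizontal and vertical field angles in degrees, from Eqs. (1) and (2)."""
    alpha = 2 * math.degrees(math.atan(sensor_w_mm / (2 * focal_mm)))
    beta = 2 * math.degrees(math.atan(sensor_h_mm / (2 * focal_mm)))
    return alpha, beta


# 1/2.7" sensor (5.27 mm x 3.96 mm imaging area) with a 2.8 mm lens
alpha, beta = field_angles(5.27, 3.96, 2.8)
print(f"{alpha:.1f} {beta:.1f}")  # prints "86.5 70.5"
```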


As illustrated in Figure 3, a camera group of six cameras covers a vertical field of view of 180° and a horizontal field of view of 173°. The optical axes of cameras 1 and 4 are separated by 86.5°. Cameras 2 and 3 are positioned above and below camera 1, with an angular separation of 109.5° between them; correspondingly, cameras 5 and 6 are positioned above and below camera 4, also separated by 109.5°. Because each camera's vertical field angle is 70.5°, the three cameras in each column span 109.5° + 70.5° = 180° vertically, and because each horizontal field angle is 86.5°, the two columns span 86.5° + 86.5° = 173° horizontally.
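The coverage figures can be reproduced by treating each camera as an angular interval centered on its optical axis. The sketch below assumes the axes are arranged symmetrically (cameras 2 and 3 at ±54.75° from camera 1, and cameras 1 and 4 at ±43.25° from the group's center line) and checks that adjacent fields leave no gap; the function name is hypothetical.

```python
from typing import Sequence


def combined_coverage(axes_deg: Sequence[float], fov_deg: float) -> float:
    """Total angle covered by identical cameras with the given optical-axis directions.

    Raises ValueError if two adjacent fields of view leave an uncovered gap.
    """
    axes = sorted(axes_deg)
    for a, b in zip(axes, axes[1:]):
        if b - a > fov_deg:  # adjacent fields do not touch
            raise ValueError(f"gap of {b - a - fov_deg:.2f} deg between axes {a} and {b}")
    return axes[-1] - axes[0] + fov_deg


# Vertical: cameras 2, 1, 3 (70.5 deg each), outer axes 109.5 deg apart
print(combined_coverage([-54.75, 0.0, 54.75], 70.5))  # 180.0
# Horizontal: cameras 1 and 4 (86.5 deg each), axes 86.5 deg apart
print(combined_coverage([-43.25, 43.25], 86.5))  # 173.0
```

In the vertical direction, neighboring fields overlap by 70.5° - 54.75° = 15.75°, whereas in the horizontal direction the two fields just touch, which is why the horizontal coverage stops at 173°.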
Fig. 3 Structure diagram of the 180° camera group


The optimal viewing range of the 2.8 mm lens is at most 7.5 m. Because the camera group's horizontal field of view is 173°, a wedge of (180° - 173°)/2 = 3.5° on each side along the wall is not covered, so at a distance of 7.5 m a strip about 7.5 m x tan 3.5° ≈ 0.46 m wide next to the wall lies outside the field of view, and this strip widens with distance. Since the actual field of view may slightly exceed the calculated value, we conducted a verification test and determined that a spacing of 15 m between camera groups is appropriate. This spacing provides adequate coverage and minimizes blind spots in the monitoring area.
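The 0.46 m figure follows from simple trigonometry. The sketch below assumes the group's horizontal coverage is centered on the wall normal, so that equal wedges are missed on each side along the wall; the function name is illustrative.

```python
import math


def wall_blind_strip(distance_m: float, coverage_deg: float = 173.0) -> float:
    """Width (m) of the uncovered strip along the wall at a given distance from a camera group.

    Assumes the horizontal coverage is centered on the wall normal, so
    (180 - coverage) / 2 degrees are missed on each side along the wall.
    """
    missed_deg = (180.0 - coverage_deg) / 2
    return distance_m * math.tan(math.radians(missed_deg))


print(round(wall_blind_strip(7.5), 2))  # 0.46
```

With groups 15 m apart, a point on the wall is at most 7.5 m from the nearest group on the same side, so under these assumptions the strip stays at or below about 0.46 m.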

4 Design of Intelligent Search-and-secure Software 

The workflow of the intelligent search-and-secure software is illustrated in Figure 4. When a sweep of a hazardous area is initiated from the computer monitoring platform in the control room, the stranded-personnel identification program is activated. Simultaneously, all detection units within the corresponding hazardous area begin capturing continuous video for 3 min. The captured images are then segmented, enlarged, and enhanced, and the PHB recognition model determines whether a person is present in them. If no person is detected, the intelligent search-and-secure server sends a signal indicating that the hazardous area has been searched and secured. If stranded personnel are identified, the monitoring platform displays the corresponding images so that on-duty personnel can confirm the result or initiate a rescan.
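The decision logic at the end of this workflow is simple. The sketch below isolates it, assuming a detector callback that returns the number of people found in a preprocessed image. The file names, function names, and callback are hypothetical placeholders, not the authors' software.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class SweepResult:
    secured: bool  # True if no person was found in any image
    flagged: List[str] = field(default_factory=list)  # images needing operator review


def run_sweep(images: Iterable[str], count_people: Callable[[str], int]) -> SweepResult:
    """Decision step of the sweep workflow (Figure 4).

    `images` identifies the preprocessed (segmented/enhanced) frames and
    `count_people` stands in for the PHB model; both are hypothetical interfaces.
    """
    flagged = [img for img in images if count_people(img) > 0]
    return SweepResult(secured=not flagged, flagged=flagged)


def fake_detector(img: str) -> int:
    """Toy stand-in for the detector: pretend one frame contains a person."""
    return 1 if img == "unit07_cam3_t060.jpg" else 0


result = run_sweep(["unit07_cam1_t000.jpg", "unit07_cam3_t060.jpg"], fake_detector)
print(result)  # SweepResult(secured=False, flagged=['unit07_cam3_t060.jpg'])
```

Returning the flagged images, rather than only a boolean, supports the confirm-or-rescan step: the platform can show operators exactly which frames triggered the alarm.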
Fig. 4 Working flow chart of intelligent search-and-secure software


4.1 Design of Pedestrian Recognition Model with Fusion of Body Parts 

The primary objective of the intelligent search-and-secure software is to identify individuals trapped in hazardous areas by analyzing camera images, with pedestrian target detection as its fundamental technology. With the rapid advancement of deep learning, deep-learning-based techniques such as image recognition and data processing have also gained popularity in the nuclear technology domain [16,17]. Deep-learning-based visual inspection is advancing rapidly. For instance, Tang et al. employed machine vision for the precise detection of crack widths [18] and used binocular vision to accurately measure the deformation of concrete columns [19]. As a crucial subset of visual detection technology, pedestrian target detection has extensive applications in fields such as autonomous driving, robotics, intelligent monitoring, and human behavior analysis [20-24].

Search and security within a hazardous area demand a high level of reliability, which challenges existing pedestrian detection technologies in scenarios where pedestrians are obstructed by equipment [23]. Through an analysis of the equipment layout and pedestrian occlusion within the hazardous area of the accelerator, we observed that, with our camera groups arranged on both sides of the hazardous area, certain body parts, such as the head, hands, and feet, are rarely completely occluded. Building on this observation, we propose a pedestrian recognition model that incorporates the distinctive features of different body parts, thereby enhancing the reliability of the intelligent search-and-secure system.

The PHB recognition model is an enhanced design based on the YOLOv8 network 
model, as depicted in Figure 5. The network model comprises three main components: 
feature extraction, detection, and header modules. 

The feature extraction module follows the YOLOv8 backbone network, which consists of five CBL layers that perform convolution and normalization on the input feature map, and four C2f modules that learn residual features. To enlarge the receptive field of the network, the spatial pyramid pooling fast (SPPF) module passes its input through three max-pooling layers in series and concatenates the intermediate outputs [25]. In addition to the three detection layers of YOLOv8, the detection module introduces a minimal-target and a maximal-target detection layer. The minimal-target detection layer focuses on small targets such as hands and feet. It takes the feature map after the 14th layer of the original network and upsamples it; in the 21st layer, the resulting 160 x 160 feature map is concatenated (Concat) with the feature map from the second layer of the backbone network, enabling the detection of very small targets [26,27].
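For readers unfamiliar with SPPF, the sketch below shows its pooling core in NumPy: three stride-1 5 x 5 max-pooling layers applied in series, with the input and all three outputs concatenated along the channel axis. The surrounding 1 x 1 convolutions (CBL blocks) are omitted, and the function names are illustrative. The final check shows why the serial design is efficient: two chained 5 x 5 poolings equal one 9 x 9 pooling, so the three outputs have the 5, 9, and 13 receptive fields of the older parallel SPP block at a lower cost.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def max_pool_same(x: np.ndarray, k: int) -> np.ndarray:
    """Stride-1 k x k max pooling with 'same' padding on a (C, H, W) array (k odd)."""
    p = k // 2
    padded = np.pad(x.astype(float), ((0, 0), (p, p), (p, p)), constant_values=-np.inf)
    windows = sliding_window_view(padded, (k, k), axis=(1, 2))  # shape (C, H, W, k, k)
    return windows.max(axis=(-2, -1))


def sppf_pool(x: np.ndarray, k: int = 5) -> np.ndarray:
    """Pooling core of SPPF: serial max pools, concatenated with the input."""
    y1 = max_pool_same(x, k)
    y2 = max_pool_same(y1, k)
    y3 = max_pool_same(y2, k)
    return np.concatenate([x, y1, y2, y3], axis=0)  # 4x the input channels


x = np.random.default_rng(0).standard_normal((2, 10, 10))  # e.g., a 10 x 10 map
out = sppf_pool(x)
print(out.shape)  # (8, 10, 10)
# Two chained 5x5 pools equal one 9x9 pool.
print(np.allclose(max_pool_same(max_pool_same(x, 5), 5), max_pool_same(x, 9)))  # True
```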

The maximal-target detection layer, by contrast, handles cases in which individuals are so close to the camera that they form super-large targets. It fuses the 10 x 10 feature map obtained from the 11th layer of the original network with the 8th-layer feature map of the backbone network, yielding the smallest feature map, which is used to detect the largest targets. The features from layers 22, 25, 28, 31, and 34 are then concatenated, fused, and passed to the detection head, which consists of five detector modules that output the prediction information. The final detection results are obtained by further post-processing of these predictions.


Fig. 5 Network architecture diagram of the PHB model. (a) PHB model main frame, showing the feature extraction module, detection module, and header module. (b) Structure of submodules, including CBL, SPPF (CBL followed by three max-pooling layers), Bottleneck, C2f (Split and Bottleneck), and the detector with Reg Pred and Cls Pred outputs.


In the PHB recognition model, the five detection layers correspond to five sets of initial detection boxes. With an input image of 640 x 640 pixels and a distance of 8 m between the camera and a hand, the hand target is approximately 6 x 6 pixels. The minimal-target detection layer has a 160 x 160 feature map and is designed to detect minimal targets larger than 4 x 4 pixels, which satisfies the requirement for hand detection [28].

The small-target detection layer has an 80 x 80 feature map and detects ordinary small targets larger than 8 x 8 pixels. The medium-target detection layer measures 40 x 40 and detects targets larger than 16 x 16 pixels. Similarly, the large-target detection layer measures 20 x 20 and detects targets larger than 32 x 32 pixels.

Additionally, a super-large-target detection layer measuring 10 x 10 handles scenarios in which a body occupying the entire image is difficult for the large-target detection layer to detect, as shown in Figure 6.
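The layer sizes above follow from dividing the 640-pixel input by each layer's stride, and the stated minimum target sizes equal those strides. The sketch below tabulates this and lists which layers nominally cover a target of a given size; the 64-pixel threshold for the super-large layer is an assumption extrapolated from the pattern, since the text does not state it.

```python
INPUT_SIZE = 640
# Stride of each detection layer; the minimum target size stated in the text equals the stride.
# The super-large layer's threshold (64) is an assumed extrapolation.
STRIDES = {"minimal": 4, "small": 8, "medium": 16, "large": 32, "super-large": 64}


def grid_sizes(input_size: int = INPUT_SIZE) -> dict:
    """Feature-map side length of each detection layer."""
    return {name: input_size // s for name, s in STRIDES.items()}


def layers_covering(target_px: float) -> list:
    """Layers whose nominal minimum target size is smaller than the target."""
    return [name for name, s in STRIDES.items() if target_px > s]


print(grid_sizes())  # {'minimal': 160, 'small': 80, 'medium': 40, 'large': 20, 'super-large': 10}
print(layers_covering(6))  # ['minimal']: a 6 x 6 pixel hand at 8 m needs the minimal layer
```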


Fig. 6 Comparison of object recognition results for oversized targets. (a) Objects not recognized by YOLOv8s. (b) Objects recognized by PHB.


4.2 Information Fusion Strategy 

Khan et al. partitioned a broad range of scales into an ensemble of three subscales, enabling each part of the network to process heads within a particular subscale; these components were then combined into an end-to-end network, yielding highly satisfactory detection results [9]. Inspired by this method, our approach extends the concept to occluded pedestrians. We treat the head, hands, and feet as individual subsets of an occluded pedestrian. Each subset is detected independently, and a fusion strategy then assembles these detections into a complete pedestrian detection.


Let the overall pedestrian detection box be $B = (x_1^b, y_1^b, x_2^b, y_2^b)$, where the coordinates $(x_1^b, y_1^b)$ and $(x_2^b, y_2^b)$ are the upper-left and lower-right corners of the detection box, respectively.
In accordance with the observations in [10], the analysis considered different pedestrian postures, including facing forward and sideways [30]. The upper section of the pedestrian detection box was designated as the head area and the lower section as the foot area. Because hands can be positioned flexibly, the middle and upper regions of a pedestrian's body, together with both sides, are considered potential hand areas. The head, foot, and hand areas were calculated as follows:


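$$\text{Head\_region} = \left(x_1^b,\ y_1^b,\ x_2^b,\ y_1^b + \tfrac{1}{3}h'\right) \tag{3}$$
$$\text{Foot\_region} = \left(x_1^b,\ y_1^b + \tfrac{2}{3}h',\ x_2^b,\ y_2^b\right) \tag{4}$$
$$\text{Hand\_region} = \left(x_1^b - w',\ y_1^b,\ x_2^b + w',\ y_2^b - \tfrac{1}{3}h'\right) \tag{5}$$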


where $w' = x_2^b - x_1^b$ and $h' = y_2^b - y_1^b$ are the width and height of the overall pedestrian detection box.
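Equations (3)-(5) translate directly into code. The sketch below assumes image coordinates with y increasing downward, head and foot regions equal to the upper and lower thirds of the box, and a hand region that extends one box width to each side and excludes the lower third; these fractions reflect our reading of the equations, and the function name is illustrative.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); y grows downward


def part_regions(box: Box) -> Dict[str, Box]:
    """Candidate head, foot, and hand regions for an overall pedestrian box."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1  # w' and h'
    return {
        "head": (x1, y1, x2, y1 + h / 3),  # upper third
        "foot": (x1, y1 + 2 * h / 3, x2, y2),  # lower third
        "hand": (x1 - w, y1, x2 + w, y2 - h / 3),  # upper two thirds, widened on both sides
    }


print(part_regions((100, 50, 130, 140)))
```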

In crowded scenes, body parts of other people can appear within a pedestrian detection box. To handle this, when the number of body parts of a given type within an overall pedestrian detection box exceeds the expected count, the distance between the center of each candidate part and the center of the corresponding body-part region of the target pedestrian is calculated, and the nearest body part is matched to the pedestrian detection box.
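The crowded-scene rule can be sketched as follows. We assume the expected counts are one head, two hands, and two feet per person, that the reference point is the center of the corresponding part region of the target pedestrian, and that Euclidean distance between box centers is used; none of these details is specified in the text, and the names are hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

EXPECTED_COUNT = {"head": 1, "hand": 2, "foot": 2}  # assumed per-person maxima


def center(box: Box) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, (y1 + y2) / 2


def assign_parts(part_type: str, candidates: List[Box], region: Box) -> List[Box]:
    """Keep the candidates of one part type that are nearest the region center.

    Filtering only happens when there are more candidates than expected.
    """
    limit = EXPECTED_COUNT[part_type]
    if len(candidates) <= limit:
        return list(candidates)
    ref = center(region)
    return sorted(candidates, key=lambda b: math.dist(center(b), ref))[:limit]


# Two head candidates for one pedestrian: keep the one nearest the head region's center.
heads = [(140, 0, 160, 20), (102, 52, 118, 70)]
print(assign_parts("head", heads, (100, 50, 130, 80)))  # [(102, 52, 118, 70)]
```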

4.3 Search-and-secure Software and Interface Design 

This study used PyQt5 to design the software interface, as depicted in Figure 7. 
Upon initiating the search and securing process through the software button, the 
underlying program proceeds by capturing a screenshot for 3 min. The captured image 
is then sent to a designated folder for segmentation, followed by the automated 
execution of the PHB detection program. If a target is detected, the interface displays 
an image with annotations denoting the entire pedestrian or specific body parts within 
the scene. On-duty personnel are prompted to confirm or initiate rescanning procedures. 
In the absence of a detected target, the interface provides a signal indicating a successful 
sweep. 

In practice, an accelerator tunnel is divided into multiple smaller controlled areas, each of which is swept at a different time. In addition, because people move relatively slowly, images were captured at 30-second intervals for detection. These measures reduce the number of captured images and improve overall efficiency.
Fig. 7 Intelligent search-and-secure software user interface


The PHB system adopts the image input approach of YOLOv8, which involves 
resizing the image to a dimension of 640 x 640 pixels before feeding it into a detection 
model. However, the camera group outputs images with a size of 1920 x 1080 pixels. 
Direct scaling of these images results in a reduction in the number of target pixels, 
potentially affecting the detection performance for small targets. To mitigate this issue, 
the search-and-secure program employed in this study divided the original image into 
3 x 3 subgraphs. These subgraphs, along with the original image, were provided as 
inputs for the PHB program. 
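The tiling step can be sketched as follows, assuming equal, non-overlapping tiles (the text does not say whether tiles overlap) and that each input is scaled so its longer side becomes 640 pixels, as in YOLOv8's letterbox resizing. Under these assumptions, each 640 x 360 tile is fed at full resolution, whereas the whole 1920 x 1080 frame is scaled down by a factor of three. Function names are illustrative.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def tile_boxes(width: int, height: int, rows: int = 3, cols: int = 3) -> List[Box]:
    """Pixel boxes of a rows x cols grid of non-overlapping tiles, row by row."""
    return [
        (c * width // cols, r * height // rows,
         (c + 1) * width // cols, (r + 1) * height // rows)
        for r in range(rows) for c in range(cols)
    ]


def scale_to_input(width: int, height: int, input_size: int = 640) -> float:
    """Resize factor when the longer side is fitted to the model input."""
    return input_size / max(width, height)


def tile_to_frame(det: Box, tile: Box) -> Box:
    """Shift a detection from tile coordinates back to full-frame coordinates."""
    dx, dy = tile[0], tile[1]
    return (det[0] + dx, det[1] + dy, det[2] + dx, det[3] + dy)


tiles = tile_boxes(1920, 1080)
print(len(tiles), tiles[4])  # 9 (640, 360, 1280, 720)
print(scale_to_input(1920, 1080), scale_to_input(640, 360))  # about 0.333 and 1.0
```

Because non-overlapping tiles can cut a person in half, also passing the original frame, as the text describes, helps keep large or boundary-straddling targets detectable.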

5 Experimental Validation Results 
5.1 Construction of the Validation Dataset 

Dataset 1: The self-built human body part dataset comprises 3,998 images extracted from scenes within an accelerator tunnel and contains more than 15,000 pedestrian targets. To diversify the dataset, the backgrounds surrounding each pedestrian in the selected images were captured randomly to introduce occlusions. The LabelImg tool was then used for precise annotation, with four classes: person, head, hand, and foot. The annotations were converted from XML to YOLO format and split into training and validation sets at a ratio of 9:1 for model training and evaluation.
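The XML-to-YOLO conversion mentioned above maps each Pascal VOC-style box (as written by LabelImg) to a line of the form `class x_center y_center width height`, normalized by the image size. A minimal sketch using only the standard library follows; the class index order is an assumption, since the text lists the classes but not their indices.

```python
import xml.etree.ElementTree as ET
from typing import List

CLASSES = ["person", "head", "hand", "foot"]  # assumed index order


def voc_to_yolo(xml_text: str) -> List[str]:
    """Convert one LabelImg (Pascal VOC) annotation to YOLO label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        cls_id = CLASSES.index(obj.find("name").text)
        bb = obj.find("bndbox")
        x1, y1, x2, y2 = (float(bb.find(t).text) for t in ("xmin", "ymin", "xmax", "ymax"))
        xc, yc = (x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h  # normalized center
        w, h = (x2 - x1) / img_w, (y2 - y1) / img_h  # normalized size
        lines.append(f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    return lines


sample = """<annotation><size><width>100</width><height>50</height></size>
<object><name>hand</name><bndbox><xmin>10</xmin><ymin>10</ymin>
<xmax>30</xmax><ymax>40</ymax></bndbox></object></annotation>"""
print(voc_to_yolo(sample))  # ['2 0.200000 0.500000 0.200000 0.600000']
```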


Dataset 2: The pedestrian detection fusion dataset was built from 10,000 images randomly sampled from prominent datasets such as COCO, VOC2012, VOC2007, SYSU, and PRW [31]. After data cleaning, only images containing pedestrians were kept, leaving 9,257 images that cover a diverse range of scenarios with occluded and unoccluded pedestrians at varying distances from the camera. The dataset was then divided into training and validation sets at a ratio of 1:9. This partitioning allows the generalization abilities of the PHB and classical models to be evaluated across different pedestrian detection scenarios.

5.2 Evaluation Metrics 

The detection evaluation in this study has two parts. The first part focuses on body-part detection, in which the detection results are compared with those of YOLOv5s and YOLOv8s to validate the effect of the introduced minimal-target and maximal-target detection layers. The results are evaluated using conventional metrics: precision, recall, and mean average precision (mAP). mAP0.5 is computed at an intersection-over-union (IoU) threshold of 0.5, and mAP0.5:0.95 is the mean of the mAP values at IoU thresholds from 0.5 to 0.95 in steps of 0.05. The metrics are calculated as follows:


$$\text{Precision} = \frac{TP}{TP + FP} \tag{6}$$
$$\text{Recall} = \frac{TP}{TP + FN} \tag{7}$$
$$\text{mAP} = \frac{1}{|C|}\sum_{c \in C} AP(c) \tag{8}$$
where TP (true positives) are positive predictions that correspond to actual positive instances, FN (false negatives) are negative predictions whose actual value is positive, and FP (false positives) are positive predictions whose actual value is negative. C is the set of object classes, and AP(c) is the average precision of class c.
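Equations (6)-(8) can be computed as below. The AP of a single class is taken here as the area under the precision-recall curve with all-point interpolation, one common convention; evaluation toolkits differ in details (COCO-style evaluation, for example, samples 101 recall points), so this is an illustrative sketch rather than the exact evaluator used in the experiments.

```python
from typing import Dict, Sequence


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if tp + fp else 0.0  # Eq. (6)


def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn) if tp + fn else 0.0  # Eq. (7)


def average_precision(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """Area under the PR curve with all-point interpolation.

    `recalls` must be non-decreasing (detections sorted by descending confidence).
    """
    r = [0.0, *recalls, 1.0]
    p = [1.0, *precisions, 0.0]
    for i in range(len(p) - 2, -1, -1):  # precision envelope: make p non-increasing
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_ap(ap_per_class: Dict[str, float]) -> float:
    return sum(ap_per_class.values()) / len(ap_per_class)  # Eq. (8)


print(precision(90, 10), recall(90, 30))  # 0.9 0.75
print(average_precision([0.5, 1.0], [1.0, 0.5]))  # 0.75
```

mAP0.5:0.95 then averages the mAP obtained when matching detections at each IoU threshold 0.50, 0.55, ..., 0.95.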


The second part of the evaluation focuses on overall pedestrian detection performance. The detection results of the PHB model are compared with those of classical models, such as the YOLO series and Faster R-CNN, to assess the generalization ability of the PHB model, using precision, recall, and average precision (AP) as metrics [32].
5.3 Experimental Setup and Results Analysis 
5.3.1 Experimental Setup and Parameter Configuration 

The experiments were conducted using a Windows 10 operating system with 
CUDA 11.1, and the training was performed on a single NVIDIA GeForce RTX 3070 
GPU. The input image size was set to 640 x 640 pixels, and the training process was 
performed for 300 epochs. Each training batch consisted of 16 images. The gradient 
descent optimizer utilized a momentum parameter of 0.937 and a weight decay 
regularization coefficient of 0.0005. The initial learning rate (Lr0) for training was set 
to 0.01. 
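For reproducibility, the settings above can be collected in one place. The sketch assumes the Ultralytics YOLOv8 training interface, whose `train()` method accepts arguments with these names; the model and dataset YAML file names are hypothetical, and selecting SGD explicitly is our assumption based on the mention of a momentum-based gradient descent optimizer. Only the dictionary is exercised here; calling `train_phb()` requires the `ultralytics` package, a GPU, and the datasets.

```python
# Training settings from Section 5.3.1, keyed by Ultralytics train() argument names.
TRAIN_ARGS = {
    "epochs": 300,  # training epochs
    "imgsz": 640,  # input image size (pixels)
    "batch": 16,  # images per batch
    "optimizer": "SGD",  # assumed: gradient descent with momentum
    "momentum": 0.937,
    "weight_decay": 0.0005,
    "lr0": 0.01,  # initial learning rate
}


def train_phb(model_cfg: str = "phb.yaml", data_cfg: str = "dataset1.yaml"):
    """Train from a (hypothetical) PHB architecture file with the settings above."""
    from ultralytics import YOLO  # imported lazily so the settings can be inspected without it

    model = YOLO(model_cfg)
    return model.train(data=data_cfg, **TRAIN_ARGS)
```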
5.3.2 Detection Results and Analysis 

Training took approximately 10.4 h for YOLOv5s, 8.9 h for YOLOv8s, and 14 h for PHB. Despite the longer training time, PHB matched or outperformed YOLOv5s and YOLOv8s in precision, recall, and mAP [33]. The improvement was most notable in recall, the key index for search-and-secure software: the overall pedestrian recall increased by 0.158 relative to YOLOv8s (Table 1). The additional detection layers of the PHB model reduce detection speed. However, given the importance of reliability for intelligent search-and-secure software, trading computing time for improved reliability is worthwhile.


Table 1 Performance comparison of PHB, YOLOv5s, and YOLOv8s

Class   Model    Precision  Recall  mAP0.5  mAP0.5:0.95
Person  YOLOv5s  0.922      0.737   0.827   0.481
Person  YOLOv8s  0.938      0.715   0.821   0.488
Person  PHB      0.941      0.873   0.913   0.706
Head    YOLOv5s  0.97       0.928   0.966   0.714
Head    YOLOv8s  0.979      0.919   0.962   0.723
Head    PHB      0.979      0.929   0.969   0.76
Hand    YOLOv5s  0.821      0.71    0.777   0.404
Hand    YOLOv8s  0.862      0.702   0.773   0.442
Hand    PHB      0.873      0.756   0.818   0.475
Foot    YOLOv5s  0.767      0.696   0.734   0.383
Foot    YOLOv8s  0.798      0.683   0.727   0.387
Foot    PHB      0.805      0.715   0.763   0.412


For machine-vision search-and-secure software, the ability to identify all stranded individuals is of paramount importance. However, although the PHB model improves the overall pedestrian recall (Table 1), its performance still falls short of this ideal.

Therefore, this study adopted a two-step information fusion approach. First, the PHB model detects the pedestrian body parts within the image. Then, the body parts are treated as subsets of the overall pedestrian and combined with the overall pedestrian bounding boxes. Specifically, for each overall pedestrian bounding box, the presence of head, hand, and foot bounding boxes within its region is checked. If such boxes are found, the component bounding box with the highest confidence score in that region is selected and paired with the overall pedestrian bounding box. When a pedestrian bounding box has a low score but a paired body-part bounding box has high confidence, the overall bounding box is retained. Additionally, a component bounding box with high confidence that does not match any overall pedestrian bounding box is preserved and output with a pedestrian label. This reflects the goal illustrated in Figure 8: the presence of a head, hands, feet, or other body parts indicates the presence of a pedestrian even when the whole pedestrian is not visible.
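The fusion rules above can be sketched as a single post-processing function. This is an illustrative reading, not the authors' code: the confidence thresholds are hypothetical values, a part is considered to lie in a pedestrian's region when its center falls inside the pedestrian box (a simplification of the regions in Equations (3)-(5)), and unmatched high-confidence parts are relabeled as pedestrians.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Det:
    label: str  # "person", "head", "hand", or "foot"
    box: Box
    conf: float


PERSON_THR = 0.5  # hypothetical: confidence at which a person box is trusted on its own
PART_THR = 0.6  # hypothetical: confidence at which a body part is trusted


def center_inside(part: Box, person: Box) -> bool:
    cx, cy = (part[0] + part[2]) / 2, (part[1] + part[3]) / 2
    return person[0] <= cx <= person[2] and person[1] <= cy <= person[3]


def fuse(dets: List[Det]) -> List[Det]:
    persons = [d for d in dets if d.label == "person"]
    parts = [d for d in dets if d.label != "person"]
    matched = set()  # indices of parts lying in some pedestrian box
    fused = []
    for person in persons:
        best: Dict[str, Det] = {}  # highest-confidence part of each type in this box
        for i, part in enumerate(parts):
            if center_inside(part.box, person.box):
                matched.add(i)
                if part.label not in best or part.conf > best[part.label].conf:
                    best[part.label] = part
        part_conf = max((p.conf for p in best.values()), default=0.0)
        # Keep confident persons, and low-scoring persons backed by a confident part.
        if person.conf >= PERSON_THR or part_conf >= PART_THR:
            fused.append(person)
    # Confident parts with no matching person box are output as pedestrians.
    for i, part in enumerate(parts):
        if i not in matched and part.conf >= PART_THR:
            fused.append(Det("person", part.box, part.conf))
    return fused


dets = [
    Det("person", (0, 0, 40, 120), 0.3),  # low-scoring, partly occluded person
    Det("head", (12, 2, 28, 20), 0.9),  # confident head inside that box
    Det("hand", (200, 60, 210, 70), 0.8),  # confident hand with no person box
    Det("person", (300, 0, 340, 120), 0.2),  # weak box with no supporting parts
]
print([(d.label, d.box) for d in fuse(dets)])
# [('person', (0, 0, 40, 120)), ('person', (200, 60, 210, 70))]
```

Favoring retention over suppression in both rules raises recall at some cost in precision, which matches the tradeoff reported for the fused models.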


Fig. 8 Comparison of effects before and after information fusion. (a) Pre-fusion recognition result. (b) Post-fusion recognition result.


We compared the PHB models built on YOLOv5s and YOLOv8s using Dataset 1; the results are presented in Table 2. Incorporating information from the other body parts significantly improved the recall of the YOLOv5s-PHB model. However, pedestrian recognition precision decreased after fusion (from 0.924 to 0.878 for YOLOv5s-PHB and from 0.941 to 0.896 for YOLOv8s-PHB), which can be attributed to errors in the recognition of the other body parts. The YOLOv8s-based PHB model reached a slightly lower post-fusion recall than its YOLOv5s counterpart (0.915 vs. 0.921), but this is offset by its higher precision. Consequently, a balance must be struck between recall and precision.


Table 2 PHB person class detection performance

Model        Stage        Precision  Recall  AP
YOLOv5s-PHB  pre-fusion   0.924      0.874   0.916
YOLOv5s-PHB  post-fusion  0.878      0.921   0.914
YOLOv8s-PHB  pre-fusion   0.941      0.873   0.913
YOLOv8s-PHB  post-fusion  0.896      0.915   0.911


5.3.3 Comparison and Analysis of Classical Algorithms 

After the information fusion strategy was applied, the PHB model recognized pedestrians in Dataset 1 better than YOLOv8s. However, Dataset 1 is relatively small, so generalization experiments were conducted on Dataset 2 to validate the generalization capability of PHB and assess its effectiveness in diverse scenarios.

Under identical configuration conditions, the PHB-based intelligent search-and-secure algorithm was compared with classical pedestrian detection algorithms on Dataset 2; Table 3 presents the results. Although PHB is built on YOLOv8s, the smaller model in the YOLOv8 series, its precision matches that of the larger YOLOv8l model (0.869 vs. 0.867), while its recall is 0.131 higher (0.82 vs. 0.689) and its AP is 0.044 higher (0.842 vs. 0.798). PHB also outperformed Faster R-CNN in precision, recall, and AP, giving it the best overall performance among the compared models. However, the precision and recall of PHB on Dataset 2 were lower than on Dataset 1. This discrepancy arises because, in the sweep scenario, pedestrians are rarely obstructed by other pedestrians. Consequently, Dataset 1, which was used to train the PHB model, emphasizes inter-class occlusion (pedestrians occluded by equipment) and may not adequately address the severe intra-class occlusion (pedestrians occluded by other pedestrians) present in Dataset 2. In summary, the PHB-based intelligent search-and-secure algorithm achieves high detection precision and a low missed-detection rate, particularly in scenarios where pedestrians are obstructed by equipment.


Table 3 Comparison of pedestrian detection performance 


Model          Precision   Recall   AP
YOLOv5s        0.822       0.698    0.79
YOLOv5l        0.846       0.74     0.833
YOLOv8s        0.861       0.687    0.793
YOLOv8l        0.867       0.689    0.798
Faster R-CNN   0.813       0.796    0.781
PHB            0.869       0.82     0.842
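The gains of PHB over YOLOv8l quoted in the text are absolute differences, that is, percentage points. The short sketch below reproduces them from Table 3 and also gives the corresponding relative changes, which are larger because the baselines are below 1. The `compare` helper is illustrative only.

```python
# Precision, recall and AP for pedestrian detection on Dataset 2, copied from Table 3.
table3 = {
    "YOLOv5s": (0.822, 0.698, 0.79),
    "YOLOv5l": (0.846, 0.74, 0.833),
    "YOLOv8s": (0.861, 0.687, 0.793),
    "YOLOv8l": (0.867, 0.689, 0.798),
    "Faster R-CNN": (0.813, 0.796, 0.781),
    "PHB": (0.869, 0.82, 0.842),
}


def compare(model: str, baseline: str) -> dict:
    """Absolute gain (percentage points) and relative gain (%) of model over baseline."""
    out = {}
    for name, m, b in zip(("precision", "recall", "AP"), table3[model], table3[baseline]):
        out[name] = {
            "points": round(100 * (m - b), 1),
            "relative_pct": round(100 * (m - b) / b, 1),
        }
    return out


print(compare("PHB", "YOLOv8l"))
```

For recall, the 13.1-point gain corresponds to a relative increase of about 19%.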


5.3.4 The Impact of Fusion Strategies in Classical Models 

PHB was obtained by modifying YOLOv8s and then applying the information fusion strategy to improve its performance. We subsequently applied the same fusion strategy directly to classical models and compared the results with those of PHB; they are summarized in Table 4. Even after adopting the fusion strategy, the SSD model performed considerably worse than PHB. Faster R-CNN with the fusion strategy achieved a higher recall than PHB (0.831 versus 0.82), but it requires nearly twice the processing time of PHB. Because the search-and-secure system processes a large volume of images and real-time performance is essential, the PHB model built on YOLOv8 is the more suitable choice 5439).
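Processing-time comparisons such as "nearly twice the processing time" are usually based on mean per-image latency. The following sketch shows how such a measurement could be taken for any detector wrapped as a Python callable. The detectors in the usage block are hypothetical stand-ins that only sleep; they are not PHB or Faster R-CNN.

```python
import time
from statistics import mean
from typing import Callable, Sequence


def mean_latency_ms(detect: Callable[[object], object], images: Sequence[object], warmup: int = 3) -> float:
    """Average wall-clock time per image in milliseconds, measured after warm-up calls."""
    for img in images[:warmup]:
        detect(img)  # warm-up: exclude one-off costs such as lazy initialisation
    timings = []
    for img in images:
        start = time.perf_counter()
        detect(img)
        timings.append((time.perf_counter() - start) * 1000.0)
    return mean(timings)  # raises StatisticsError if images is empty


def throughput_fps(latency_ms: float) -> float:
    """Images per second sustained by a single sequential worker."""
    return 1000.0 / latency_ms


if __name__ == "__main__":
    # Hypothetical stand-in detectors; replace with real model inference calls.
    def fast_detector(img):
        time.sleep(0.01)

    def slow_detector(img):
        time.sleep(0.02)

    imgs = list(range(10))
    fast, slow = mean_latency_ms(fast_detector, imgs), mean_latency_ms(slow_detector, imgs)
    print(f"fast: {fast:.1f} ms, slow: {slow:.1f} ms, ratio: {slow / fast:.2f}")
```

A detector with twice the latency halves the number of camera frames that can be checked per second, which is why latency weighs heavily in a system that processes many images per sweep.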


Table 4 Comparison of fusion strategies' impact on classical models 


Model          Stage         Precision   Recall   AP
SSD            pre-fusion    0.801       0.681    0.77
SSD            post-fusion   0.753       0.78     0.759
Faster R-CNN   pre-fusion    0.813       0.796    0.781
Faster R-CNN   post-fusion   0.798       0.831    0.776
PHB            post-fusion   0.869       0.82     0.842
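Table 4 shows the same pattern seen for PHB in Table 2: the fusion strategy raises recall at some cost in precision. The sketch below computes the post-minus-pre change for each classical model from the table values. The `fusion_deltas` helper is illustrative.

```python
# Precision, recall and AP before and after fusion, copied from Table 4.
table4 = {
    "SSD": {"pre": (0.801, 0.681, 0.77), "post": (0.753, 0.78, 0.759)},
    "Faster R-CNN": {"pre": (0.813, 0.796, 0.781), "post": (0.798, 0.831, 0.776)},
}


def fusion_deltas(model: str) -> dict:
    """Change (post-fusion minus pre-fusion) in each metric, in percentage points."""
    pre, post = table4[model]["pre"], table4[model]["post"]
    return {
        name: round(100 * (after - before), 1)
        for name, before, after in zip(("precision", "recall", "AP"), pre, post)
    }


for model in table4:
    print(model, fusion_deltas(model))
```

For SSD, fusion adds 9.9 points of recall but costs 4.8 points of precision. For Faster R-CNN, the corresponding figures are +3.5 and -1.5 points.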


We also compared PHB with the classical models on large targets, formed by pedestrians close to the camera, and on small targets such as hands and feet. Although unmodified YOLOv8 performed poorly on small targets, our modifications mitigated this limitation: PHB recognizes small targets about as well as Faster R-CNN and performs considerably better on large targets.
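A comparison split by target size requires a rule for assigning each ground-truth box to a size class. The paper does not state its thresholds, so the sketch below assumes the common COCO convention: small means an area below 32² pixels and large means an area above 96² pixels. It also assumes that each ground-truth box has already been matched (or not) to a detection, for example by IoU, and computes recall per size class.

```python
from collections import Counter

# Area thresholds borrowed from the COCO evaluation convention (assumption: the
# paper does not state how it separated small and large targets).
SMALL_MAX = 32 * 32
LARGE_MIN = 96 * 96


def size_bucket(x1: float, y1: float, x2: float, y2: float) -> str:
    """Classify a box as 'small', 'medium' or 'large' by its pixel area."""
    area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    if area < SMALL_MAX:
        return "small"
    if area > LARGE_MIN:
        return "large"
    return "medium"


def recall_by_size(gt_boxes, matched):
    """gt_boxes: list of (x1, y1, x2, y2); matched: parallel list of bools (detected or not)."""
    total, hits = Counter(), Counter()
    for box, ok in zip(gt_boxes, matched):
        bucket = size_bucket(*box)
        total[bucket] += 1
        hits[bucket] += int(ok)
    return {bucket: hits[bucket] / total[bucket] for bucket in total}
```

Under this scheme, hands and feet seen from a distance would mostly fall in the small class, and a pedestrian standing next to a camera in the large class.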

6 Conclusion 

Based on these performance evaluations, we installed two 180° camera groups in a section of the China Spallation Neutron Source accelerator tunnel P%, as shown in Fig. 9. A relatively enclosed, controlled test area was created by deliberately introducing partial physical occlusion.


Fig. 9 Photograph of the intelligent search-and-secure system deployed in the tunnel 


Several field tests were conducted in this controlled area. The intelligent search-and-secure system successfully detected stranded individuals in these tests, but the tests also revealed issues that remain to be resolved. For instance, the system misidentified human figures in promotional photographs displayed in the tunnel as pedestrians. These issues will be addressed in future improvements to the system.

Machine-vision-based search-and-secure technology has considerable potential for application in diverse settings such as railway yards, chemical plants, museums, and other intermittently hazardous areas 278). It therefore merits further development and wider deployment.